Gravitational Wave Tests of Strong Field General Relativity with Binary Inspirals: 
Realistic Injections and Optimal Model Selection 
We study generic tests of strong-field General Relativity using gravitational waves emitted during
the inspiral of compact binaries. Previous studies have considered simple extensions to the standard 
post-Newtonian waveforms that differ by a single term in the phase. Here we improve on these 
studies by (i) increasing the realism of injections and (ii) determining the optimal waveform families 
for detecting and characterizing such signals. We construct waveforms that deviate from those 
in General Relativity through a series of post-Newtonian terms, and find that these higher-order 
terms can affect our ability to test General Relativity, in some cases by making it easier to detect 
a deviation, and in some cases by making it more difficult. We find that simple single-phase post- 
Einsteinian waveforms are sufficient for detecting deviations from General Relativity, and there is 
little to be gained from using more complicated models with multiple phase terms. The results 
found here will help guide future attempts to test General Relativity with advanced ground-based 
detectors. 
I. INTRODUCTION

Einstein's General theory of Relativity (GR) has 
weathered an array of increasingly stringent tests since 
the theory first gained prominence in November 1919, 
when reports of Eddington's expedition appeared in 
newspapers around the world: "Revolution in science 
- New theory of the Universe - Newtonian ideas over- 
thrown". Subsequent observations have continued to 
strengthen the case for Einstein's theory, though ob- 
servations have yet to probe the dynamical, non-linear 
regime where the most revolutionary aspects of the the- 
ory take hold. For example, GR has passed all Solar 
System tests with flying colors, but these are based on 
stationary, weak, and linear gravitational fields, where 
characteristic velocities are small relative to the speed of 
light [1]. The theory has also passed all binary pulsar
tests, but these systems have gravitational fields that are
quasi-stationary and only moderately strong, with characteristic
velocities of ~ 0.1% the speed of light [HQ-. In
the near future, gravitational wave (GW) observations
will test GR in a regime that has so far evaded observation:
the strong field, where the gravitational field is of
order unity and velocities approach the speed of light.

Compact binary coalescences, the slow inspiral and 
merger of black holes (BHs) and/or neutron stars (NSs), 
will be strong sources of GWs, and these will be excellent 
tools for testing GR. During the inspiral phase, the bi- 
nary components have orbital velocities ranging from 1% 
to ~ 50% the speed of light, which leads to strong and 
dynamically evolving gravitational fields. These GW sig- 
nals evolve through thousands of radians of phase in the 
most sensitive band of ground-based detectors, such as 
aLIGO and aVIRGO, with signal-to-noise ratios (SNRs) 
that will allow us to extract signal parameters with good 
accuracy. Thus, even small differences in the dynamics 
of the gravitational theory can lead to large accumulated
effects in the waveform during the inspiral.

Despite their promise, GW tests of GR are, unfortu- 
nately, very difficult to carry out, for two main reasons. 
One reason is purely theoretical - we currently lack candi- 
date alternative theories that are particularly appealing. 
Instead, we have many models that are either heavily 
constrained, like scalar-tensor theories [1], or that have
theoretical issues, such as knowledge only of their effective,
low-curvature form. The other cause of difficulty
lies in the data analysis. Most techniques for detecting
and characterizing GW observations require accurate
templates to identify weak signals buried in the
instrument noise. Given the already large parameter
dimensionality of the GR waveform models, and the wide
variety of modified gravity theories [3]-[19], the construction
of individual template banks for all possible non-GR
models is simply not feasible.

A much more appealing alternative is to devise a 
generic non-GR template family with which to model 
the signals, and allow the data to select the appropriate 
model via Bayesian inference. The first such model was
proposed by Arun et al. [20]-[22], where the coefficients in
the post-Newtonian (PN) expansion of the phase were
treated as independent fit parameters. However, the structure
of the PN series does not allow for all known modified gravity
deviations, including potentially interesting ones such as the
emission of dipolar radiation predicted in scalar-tensor
theories. For this reason, Yunes and Pretorius [23] developed
the so-called parameterized post-Einsteinian (ppE)
framework, which allows for a wide range of deforma- 
tions to the amplitude and phase of the waveform. In 
the inspiral phase, these can be represented through a 
polynomial in the GW frequency, with free constants, or 
ppE parameters, that represent the amplitude and the 
frequency exponent of the deformations [22]. The simplest
ppE inspiral waveform in the Fourier domain has the form

$$\tilde{h}_{\rm ppE}(f) = \tilde{h}_{\rm GR}(f)\,\left(1 + \alpha u^{a}\right) e^{i \beta u^{b}}, \qquad (1)$$

where $u = (\pi \mathcal{M} f)^{1/3}$ is a dimensionless velocity,
$\mathcal{M} = \eta^{3/5} M$ is the chirp mass, $M = m_1 + m_2$ is the total mass,
$\eta = m_1 m_2/(m_1 + m_2)^2$ is the symmetric mass ratio,
$m_{1,2}$ are the component masses, and $f$ is the GW frequency.
The Fourier transform of the GR waveform is here $\tilde{h}_{\rm GR}$,
while $(\alpha, a, \beta, b)$ are ppE parameters. Clearly, in the limit
$(\alpha, \beta) = (0, 0)$, one recovers the GR prediction, while for
other values of the ppE parameters one recovers the
leading-order waveforms of all known modified gravity theories.
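
As an illustration of Eq. (1), the following minimal Python sketch applies a single-term ppE deformation to one frequency-domain sample of a GR waveform. The function names, the placeholder GR sample and the choice of a 1.4 + 1.4 solar-mass binary are assumptions made for this example only; the conversion constant is the solar mass expressed in seconds (G = c = 1).

```python
import cmath
import math

MSUN_S = 4.925491e-6  # G * M_sun / c^3 in seconds (geometric units)


def chirp_mass(m1, m2):
    """Chirp mass eta^(3/5) * M, in the same units as m1 and m2."""
    total = m1 + m2
    eta = m1 * m2 / total**2
    return eta**0.6 * total


def ppe_waveform(h_gr, f_hz, mchirp_msun, alpha=0.0, a=0.0, beta=0.0, b=0.0):
    """Apply the single-term ppE deformation of Eq. (1) to one frequency sample."""
    # Dimensionless velocity u = (pi * Mchirp * f)^(1/3), with Mchirp in seconds.
    u = (math.pi * mchirp_msun * MSUN_S * f_hz) ** (1.0 / 3.0)
    return h_gr * (1.0 + alpha * u**a) * cmath.exp(1j * beta * u**b)


mc = chirp_mass(1.4, 1.4)
h_gr = 1.0e-22 * cmath.exp(0.3j)  # hypothetical placeholder for a GR waveform sample
h_same = ppe_waveform(h_gr, 100.0, mc)                  # alpha = beta = 0: GR limit
h_mod = ppe_waveform(h_gr, 100.0, mc, beta=0.01, b=-3)  # pure phase deformation
```

With alpha = 0 the deformation leaves the amplitude untouched and only rotates the phase by beta * u**b, which is the case studied in the rest of this paper.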

The first data analysis implementation of the ppE 
framework was carried out by Cornish et al. [24], where
ppE waveforms of the form of Eq. (1) were used both in
the generation of the simulated signals and in the
extraction of the model parameters in a Bayesian model
selection framework. This study was a proof-of-principle
that the ppE framework can be successfully implemented
to carry out tests of GR. A second study shortly
followed [25] that confirmed the results of Cornish et al. and
extended them to include lower SNR signals and multiple
detections. While this study also used the simple
one-phase ppE model of Eq. (1) for the signal injections, the
models used to analyze the simulated data included more 
complicated ppE waveform models with multiple phase 
corrections. 

In this paper we revisit the ppE framework and carry 
out a more realistic data analysis study. First, we exam- 
ine the effect of more realistic non-GR injections that in- 
clude modifications to several terms in the PN GR phase, 
instead of a single one. Generic deviations from GR will 
be characterized by an infinite number of phase correc- 
tions. Ground-based detectors will not be sensitive to all 
of them, just as they are not sensitive to GR signals to 
arbitrarily high PN order. The presence of the first few 
higher-order terms can affect our ability to test GR. We 
find that the presence of multiple phase modifications 
will improve our chances of detecting departures from 
GR if they are of the same sign. However, if the phase 
modifications are of alternating sign, they can cancel out 
to some degree, and make a non-GR signal appear to be 
well described by GR. 

As something of an aside, we consider the issue of 
adding explicit noise realizations to the simulated signals,
especially for low SNR signals. This is done because
some concerns have been voiced about the conclusions of
Cornish et al. [24], owing to the relatively high SNR
of the signals used and their technique of accounting for
the noise solely through the weighting of the likelihood
function. We show analytically and numerically that the
conclusions of [24] remain unaffected when adding an
explicit noise realization. We also show that these results
scale linearly with SNR down to values close to the de- 
tection threshold. 

We then tackle the problem of determining the opti- 
mal ppE model for detecting departures from GR. On 
the one hand, including additional phase terms will improve
the fit and increase the likelihood. On the other
hand, adding additional parameters to the model incurs
an "Occam penalty". We find, on balance, that in almost
all cases, templates with only one ppE parameter
are preferred over those with multiple parameters. This
suggests that the simple one-parameter ppE model may
well be the ideal one to search for GR deviations in early
data from advanced detectors.

The remainder of this paper is organized as follows. 
Section II builds non-GR injections and studies their
effect on signal extraction and the detection of departures
from GR. Section III considers the effects of adding
explicit noise realizations to the signals, and how the
strength of the signal affects our ability to test GR. Section IV
studies different ppE waveform models to determine
the optimal one for performing GW tests of GR.
Section V concludes and points to future directions for
research. Throughout this paper we use geometric units
with G = c = 1.



II. REALISTIC SIGNAL INJECTIONS 

The simplest ppE waveform family presented in the 
introduction is not sufficiently complex to represent a 
realistic alternative gravity theory. This is because mod- 
ified gravity theories will differ from GR by an infinite 
series of terms in both the amplitude and the phase. We 
expect that an alternative theory of gravity will give rise 
to waveforms where the amplitude and phase depend 
on one or more fundamental coupling constants multi- 
plied by functions of the system parameters. Thus, if 
one wishes to use a ppE-type template to inject non-GR
signals, one must consider more complex ppE models,
such as Eq. (46) in [23], namely Eq. (1) with the replacements [23]

$$\alpha u^{a} \rightarrow \sum_{i=0}^{N} \alpha_i u^{a_i}, \qquad \beta u^{b} \rightarrow \sum_{i=0}^{N} \beta_i u^{b_i}, \qquad (2)$$

where the $\alpha_i$ and $\beta_i$ depend on a universal coupling constant
$\kappa$ and on functions of the system parameters $\lambda$:

ou(k, A) = K^2<f>i(X) 

A(k,A) (3) 

The functions $\phi_i(\lambda)$ and $\theta_i(\lambda)$ can be computed for specific
theories, but their general form is unknown. So while $\kappa$
takes a single value for a particular theory, the $(\alpha_i, \beta_i)$
constants will vary from detection to detection depending
on the masses, spins and other parameters that describe
the system. In some theories there will be more than one
additional coupling constant $\kappa$, but here we will assume
that one sector of the modified theory dominates and
consider only a single series of correction terms. With a
large number of high SNR detections, it may be possible
to infer the functional form of $(\phi_i(\lambda), \theta_i(\lambda))$. However,
since our immediate concern is in deciding if the data are
consistent with the prediction of GR, we will argue that
it is best to use a much simpler waveform for the initial
tests.

The ppE exponents $(a_i, b_i)$ are real numbers that give
the effective PN order at which the non-GR modification
enters the signal, while the ppE amplitude parameters
$(\alpha_i, \beta_i)$ are real numbers that indicate the strength of the
modification, in turn controlled by the overall coupling
strength $\kappa$. In principle, we could extend the sum in
Eq. (2) to infinity, but in practice, realistic detectors are
sensitive only to a finite number of terms in the phase 
and amplitude. The injected signals then consist of a 
GR waveform with its amplitude and phase modified by 
a series of ppE corrections. 

Several simplifications can be made to the general 
waveform presented above. First, for quasi-circular inspiral
signals, Chatziioannou et al. [2r| have argued that
analyticity demands that the exponents $(a_i, b_i)$ take on
integer values with possible logarithmic corrections (just
as the PN expansion in GR comes in integer powers of
$u$ and products of integer powers of $u$ with $\log u$, where
$u$ is related to the orbital velocity). Second,
ground-based advanced detectors will be of limited
sensitivity, rarely being sensitive to more than the first
three terms in the PN expansion, and usually being much
more sensitive to the phase evolution than they are to
the amplitude evolution. Thus, we choose to simplify
the analyses by truncating the sum at three terms and
setting $\alpha_i = 0$. The injections are then given by Eq. (1)
but with the replacement

$$\beta u^{b} \rightarrow \sum_{i=0}^{2} \beta_{b+i}\, u^{b+i} = \beta_{b} u^{b} + \beta_{b+1} u^{b+1} + \beta_{b+2} u^{b+2}, \qquad (4)$$

and $\alpha = 0$, where in the last equality the Einstein summation
convention is not assumed. Written in this way, the
coefficient $\beta_b$ always multiplies $u^b$ for any $b$. Previous
investigations have been restricted to signals with only one
ppE correction injected, which reduces to Eq. (1) when
one retains only the first term in the sum. As argued
above, this is far from realistic for a modified gravity
injection and we will show that the higher-order terms can
have a significant effect on the analysis.
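
To make the role of the higher-order terms concrete, the short Python sketch below evaluates the truncated phase sum of Eq. (4) at a single velocity. The coefficient values are those listed for the same-sign and alternating-sign injections in Table I; the velocity u = 0.2 and the function name are assumptions chosen only for illustration.

```python
def ppe_phase(u, betas, b0):
    """Phase correction sum_i betas[i] * u**(b0 + i), following Eq. (4)."""
    return sum(beta * u ** (b0 + i) for i, beta in enumerate(betas))


u = 0.2  # illustrative dimensionless velocity (assumption)
single = ppe_phase(u, [0.01], -3)              # one term, b = -3
same_sign = ppe_phase(u, [0.01, 0.04], -3)     # b = -3 and b = -2, same sign
alternating = ppe_phase(u, [0.01, -0.04], -3)  # b = -3 and b = -2, opposite sign
```

At this velocity the two terms are of comparable size, so they add when the signs agree and largely cancel when the signs differ, which is the behavior explored in the injections below.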

Ultimately the claim that a detection is in agreement 
(or conflict) with GR comes down to model selection. 
Does GR describe the data best or does another model 
do a better job? In Bayesian statistics [HI Uli HI! i model 
selection is performed via the calculation of the Bayes 
factor, which is simply the "betting odds" of one model 
against another. For instance, if the Bayes factor between 
GR and a non-GR model is 100, and you originally gave 
both possibilities equal odds, then there is a 100:1 odds 
ratio that GR better describes the data than the other 
model. In this case, you would be well-advised to put 
your money on GR. There is no prescription for deciding 
what Bayes factor is required before we should consider 
one model "right" and another "wrong". However, in
the case of a well-tested theory like GR being brought
into question by, for instance, a GW signal, it is likely 
that the scientific community would require a detection 
that gives us a fairly high Bayes factor in favor of the 
non-GR model to overcome the prior belief in GR being 
the correct theory. In order to determine whether more 
ppE terms in an injection affect the detectability of a 
deviation from GR, we need to see how these different 
types of injections affect the Bayes factor. 
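
The "betting odds" reading of the Bayes factor can be written as posterior odds = Bayes factor x prior odds. The following Python sketch is only an illustration of that arithmetic; the prior odds of 1:1000 against the non-GR model is a hypothetical number, not one taken from this paper.

```python
def posterior_odds(bayes_factor, prior_odds=1.0):
    """Posterior odds of model A over model B, given the Bayes factor and prior odds."""
    return bayes_factor * prior_odds


# Equal prior odds: a Bayes factor of 100 gives 100:1 posterior odds.
even_prior = posterior_odds(100.0)

# Hypothetical strong prior belief in GR (1:1000 against the non-GR model):
# a Bayes factor of 100 in favor of non-GR still leaves GR ahead.
skeptical_prior = posterior_odds(100.0, prior_odds=1.0 / 1000.0)
```

This is why a modest Bayes factor in favor of a non-GR model would be unlikely to overturn a strong prior belief in GR.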

Throughout this paper, Bayes factors are calculated
using the Savage-Dickey density ratio [27], HH and/or Reversible
Jump Markov Chain Monte Carlo (RJMCMC)
[27, 30]. In the Savage-Dickey method, the Bayes factor
between two nested models, i.e. model X and model Y
that differ only by the addition of a parameter to model
Y, is calculated by comparing the weight of the marginalized
posterior to the weight of the prior distribution for
the "extra" parameter at the value that this parameter
takes on for the lower-dimensional model:

$$B_{XY} = \frac{p(\kappa = 0 \,|\, s)}{p(\kappa = 0)}. \qquad (5)$$

In our case, the extra parameter is the coupling strength
$\kappa$, which has the GR limit $\kappa = 0$, and hence $\beta_i = 0$. To
calculate the Bayes factors this way, we run an MCMC
search using ppE templates in order to generate the
posterior distribution for $\beta_i$, and then calculate the posterior
weight in this distribution at $\beta_i = 0$. We then compare
this posterior weight to the prior density at this point.
We here use a flat prior distribution between -5.0 and
5.0 for all $\beta$ values. The main advantage of this method
over other possibilities is that it only requires exploration
of the higher-dimensional space.
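
The density-ratio estimate of Eq. (5) can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used for this paper: the posterior samples are drawn from Gaussians standing in for MCMC chains, and the histogram bin half-width is an assumed tuning choice. Only the flat prior on [-5, 5] is taken from the text.

```python
import random


def savage_dickey_bf(samples, prior_lo=-5.0, prior_hi=5.0, half_width=0.002):
    """Bayes factor of the nested (GR) model over the ppE model, as in Eq. (5).

    The posterior density at beta = 0 is estimated from the fraction of
    samples falling in a small bin centered on zero (an assumed estimator).
    """
    n_near = sum(1 for x in samples if abs(x) <= half_width)
    posterior_density = n_near / (len(samples) * 2.0 * half_width)
    prior_density = 1.0 / (prior_hi - prior_lo)  # flat prior: 0.1 on [-5, 5]
    return posterior_density / prior_density


rng = random.Random(0)
# Stand-ins for marginalized MCMC chains of beta (hypothetical widths and means).
gr_like = [rng.gauss(0.0, 0.01) for _ in range(200_000)]
non_gr = [rng.gauss(0.05, 0.01) for _ in range(200_000)]

bf_gr_like = savage_dickey_bf(gr_like)  # much larger than 1: GR preferred
bf_non_gr = savage_dickey_bf(non_gr)    # much smaller than 1: ppE model preferred
```

The estimate becomes noisy when very few samples land near zero, which is one reason an independent method such as RJMCMC is a useful cross-check.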

All tests in this section use GWs emitted by a NS-NS
binary with $\approx 1.4 M_\odot$ component masses in the inspiral
phase with SNR ~ 12. We model all waveforms with a
quadrupolar, adiabatically quasi-circular waveform, with
a 3.5PN-accurate phasing, but neglecting PN amplitude
corrections and spin effects, and truncating all evolution
at the Schwarzschild test-particle innermost stable circular
orbit. The waveforms are then described by nine
source parameters: the chirp mass and the reduced mass; the
time and phase of coalescence; two sky-position angles;
the inclination angle and the GW polarization angle; and
the luminosity distance (see [24] for a similar waveform
prescription). In addition to these we have the ppE phase
parameters of Eq. (4). We consider a three-detector network
of second-generation detectors, such as aLIGO at
Hanford, aLIGO at Livingston, and aVirgo, with identical
broadband-configuration spectral densities, as in our
previous paper [24], assuming the noise to be Gaussian
and stationary. Table I shows the system parameters for
all systems studied in this paper (masses are listed in solar
masses, and luminosity distances are in megaparsecs).
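
For orientation, the mass combinations and the cutoff frequency used in this setup can be computed as in the Python sketch below. It assumes the standard Schwarzschild test-particle ISCO frequency, f = 1 / (6^{3/2} pi M) in geometric units, and uses the ppE-injection masses listed in Table I; the function names are illustrative.

```python
import math

MSUN_S = 4.925491e-6  # G * M_sun / c^3 in seconds


def symmetric_mass_ratio(m1, m2):
    """eta = m1 * m2 / (m1 + m2)^2."""
    return m1 * m2 / (m1 + m2) ** 2


def chirp_mass(m1, m2):
    """Chirp mass eta^(3/5) * (m1 + m2), in the units of m1 and m2."""
    return symmetric_mass_ratio(m1, m2) ** 0.6 * (m1 + m2)


def f_isco_hz(m1, m2):
    """GW frequency at the Schwarzschild test-particle ISCO, masses in solar masses."""
    return 1.0 / (6.0**1.5 * math.pi * (m1 + m2) * MSUN_S)


def u_of_f(f_hz, m1, m2):
    """Dimensionless velocity u = (pi * Mchirp * f)^(1/3)."""
    return (math.pi * chirp_mass(m1, m2) * MSUN_S * f_hz) ** (1.0 / 3.0)


m1, m2 = 1.62, 1.73            # ppE-injection masses from Table I
f_cut = f_isco_hz(m1, m2)      # roughly 1.3 kHz
u_cut = u_of_f(f_cut, m1, m2)  # roughly 0.31
```

The value of u at the cutoff sets the largest velocity at which adjacent ppE terms in Eq. (4) are compared.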

In this paper, we examine two factors that influence
the outcome: the signs of the different phase corrections,
and their relative magnitude. We begin by exploring the
effects of injecting phase corrections with the same or
differing signs. In particular, let us study the effect that
this relative sign has on the detectability of non-GR
behavior. We will then explore the difference between
non-GR phase corrections that either shrink in magnitude
at higher PN order, stay at approximately the same
magnitude, or grow in magnitude at higher PN order.

To examine how the relative sign of the phase
corrections affects the detectability of departures from
GR, we consider three non-GR injections:

• Case i. A ppE waveform with a single ppE phase
term (b = -3), with magnitude controlled by $\beta_{-3}$.

• Case ii. A ppE waveform with two ppE phase
terms (b = -3 and b = -2), with $\beta_{-3}$ and $\beta_{-2}$ of
the same sign.

• Case iii. A ppE waveform with two ppE phase
terms (b = -3 and b = -2), with $\beta_{-3}$ and $\beta_{-2}$ of
different sign.

We choose these values of b because, for b < -5, $\beta_b$ is
already well constrained by binary pulsar observations, as
demonstrated in [24, 31]. Case (i) is the type of injection
that has been explored in previous work. Cases (ii) and
(iii) include higher-order phase corrections, but differ in
their relative sign.

Figure 1 shows the Bayes factors between GR and a
one-parameter ppE template family with b = -3 and ppE
parameter $\beta_{-3}$ for the three injections discussed above.
The error bars in this figure are estimated by calculating
the Bayes factors using multiple MCMC runs with
different random seeds. The spread in the calculated values
is reflected in the error bars. Observe that when
the injection contains ppE corrections of the same sign
(dotted, magenta curve), these add up to make the signal
more discernible from GR. In this case, the Bayes factor
crosses 10 for the smallest value of $\beta_{-3}$. Therefore, if
$(\beta_b, \beta_{b+1})$ share the same sign, we can detect deviations
from GR with lower strengths than if there were only
one phase correction. On the other hand, observe that
when the non-GR signal contains modifications of
alternating sign (dashed blue line), these have the effect of
partially canceling out the non-GR effect. In this case,
the Bayes factor crosses 10 for a much larger value of $\beta_{-3}$.
Therefore, if the corrections have alternating signs, e.g. if
$(\beta_b, \beta_{b+1})$ have different signs, then our ability to detect
departures from GR is reduced. The signs of the ppE
amplitude parameters also affect the PDFs of the recovered
$\beta_i$ parameters, as we will see below.

The relative magnitudes of the terms also affect the
analysis. Concentrating on the multi-term ppE models of
Eq. (4), we define three cases, depending on the relative
magnitude of the terms in the series expansion:




[Figure 1 plot; horizontal axis: $\beta_{-3}$, from 0.002 to 0.014]

FIG. 1: (Color Online) Bayes factors between a GR model
and a one-parameter ppE model for three different ppE signal
injections. The dotted (magenta) line corresponds to an
injection with the two positive ppE terms $\beta_{-3} > 0$ and $\beta_{-2} > 0$
(case ii), the solid (red) line corresponds to the single, positive
ppE term $\beta_{-3} > 0$ (case i), and the dashed (blue) line
corresponds to the two ppE terms of alternating sign $\beta_{-3} > 0$ and
$\beta_{-2} < 0$ (case iii). System parameters for the systems studied
here are listed in Table I. As expected, the signal with ppE
terms of alternating sign is harder to distinguish from GR,
as evidenced by its Bayes factor growing the slowest with the
magnitude of $\beta_{-3}$.



• Convergent Case: Injections where the ppE
terms get smaller as the PN order increases,
i.e. $\beta_b > \beta_{b+1} > \beta_{b+2}$.

• Critical Case: Injections where the ppE terms
remain of about the same size as the PN order increases,
i.e. $\beta_b \sim \beta_{b+1} \sim \beta_{b+2}$.

• Asymptotic Case: Injections where the ppE
terms get bigger as the PN order increases,
i.e. $\beta_b < \beta_{b+1} < \beta_{b+2}$.

Obviously, there are an infinite number of ways to choose
how large the $\beta_i$ constants are relative to each other, but
the classification defined above provides a useful summary.
More concretely, we here define convergent¹ cases
as those where the ppE terms injected have
$\beta_{n+1} < (u_{\max})^{-1} \beta_n$, where $u_{\max} = (\pi \mathcal{M} f_{\max})^{1/3}$. Similarly, critical
cases are defined such that $\beta_{n+1} \sim (u_{\max})^{-1} \beta_n$, while
asymptotic cases have $\beta_{n+1} > (u_{\max})^{-1} \beta_n$.

¹ We use the words "convergent," "critical" and "asymptotic"
loosely here. These names do not necessarily imply that the
series possess these properties.

An alternative and roughly equivalent way to define
these three different cases is by the number of useful cycles
of phase [32] that accumulate during the signal for
each correction to the phase. The number of useful cycles
Signal            $\alpha$  $\phi_L$  $\phi_c$  $m_1$ ($M_\odot$)  $m_2$ ($M_\odot$)  $\log(D_L)$ (Mpc)  $t_c$  $\delta$  $\theta_L$  $\beta_{-3}$  $\beta_{-2}$
One ppE Term      1.0       4.76      1.9       1.62               1.73               3.96               5.58   0.77      -0.43       0.01          0.0
Alternating Sign  1.0       4.76      1.9       1.62               1.73               3.96               5.58   0.77      -0.43       0.01          -0.04
Same Sign         1.0       4.76      1.9       1.62               1.73               3.96               5.58   0.77      -0.43       0.01          0.04
Convergent        1.0       4.76      1.9       1.62               1.73               3.96               5.58   0.77      -0.43       0.01          0.005
Critical          1.0       4.76      1.9       1.62               1.73               3.96               5.58   0.77      -0.43       0.01          0.08
Asymptotic        1.0       4.76      1.9       1.62               1.73               3.96               5.58   0.77      -0.43       0.01          0.25
GR Source         3.95      4.14      0.68      1.45               1.43               0.9                3.41   -0.66     0.76

TABLE I: Source parameters for sources used in Fig. 1 (top), Fig. 2 (middle) and Figs. 3, 4 and 6 (bottom).



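is defined via

$$N_{\rm useful} = \left( \int_{f_{\min}}^{f_{\max}} df\, \frac{a^2(f)}{f S_n(f)}\, \frac{1}{2\pi} \frac{d\phi}{df} \right) \left( \int_{f_{\min}}^{f_{\max}} \frac{df}{f}\, \frac{a^2(f)}{f S_n(f)} \right)^{-1}, \qquad (6)$$

where $|\tilde{h}(f)|^2 = N(f)\, a^2(f)/f^2$ is the squared modulus
of the frequency domain GW signal, $S_n(f)$ is the noise spectral
density of the detector, and $N(f) =
(f/2\pi)(d\phi/df)$. This quantity tells us about the phase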
accumulated from each PN (or ppE) term during the 
course of the signal, weighted by the sensitivity of the de- 
tector to different parts of frequency space. Tables of the 
number of useful cycles of phase for each system analyzed 
in this paper are included in this section. "Convergent" 
signals are those for which the number of useful cycles 
due to the non-GR phase corrections decreases at higher 
order. "Critical" signals have roughly the same number 
of useful cycles at each order. "Asymptotic" signals have 
larger numbers of useful cycles from the non-GR phase 
at higher orders. 
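
A numerical evaluation of Eq. (6) can be sketched as follows. This is a hedged illustration rather than the calculation behind the tables in this section: the noise curve is a toy function (not a real detector spectrum), the amplitude scaling a^2(f) ∝ f^{4/3} is the leading-order inspiral behavior, the frequency band is assumed, and the phase derivative is taken directly from a single ppE term beta * u**b.

```python
import math

MSUN_S = 4.925491e-6  # G * M_sun / c^3 in seconds


def trapezoid(xs, ys):
    """Composite trapezoid rule for samples ys on the grid xs."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))


def useful_cycles(freqs, a2, sn, dphi_df):
    """Noise-weighted average of N(f) = (f / 2 pi) dphi/df, following Eq. (6)."""
    weight = [a2_i / (f * sn_i) for f, a2_i, sn_i in zip(freqs, a2, sn)]
    num = trapezoid(freqs, [w * dp / (2.0 * math.pi) for w, dp in zip(weight, dphi_df)])
    den = trapezoid(freqs, [w / f for w, f in zip(weight, freqs)])
    return num / den


# Illustrative inputs; every choice below is an assumption for this example.
freqs = [20.0 + 0.5 * k for k in range(1961)]                   # 20 Hz to 1000 Hz
a2 = [f ** (4.0 / 3.0) for f in freqs]                          # leading-order inspiral scaling
sn = [1.0 + (50.0 / f) ** 4 + (f / 300.0) ** 2 for f in freqs]  # toy noise curve
mchirp_s = 1.2188 * MSUN_S                                      # 1.4 + 1.4 M_sun binary
beta, b = 0.01, -3


def ppe_dphi_df(f):
    """Derivative with respect to f of the ppE phase term beta * u**b."""
    u = (math.pi * mchirp_s * f) ** (1.0 / 3.0)
    return beta * b * u ** b / (3.0 * f)


n_ppe = useful_cycles(freqs, a2, sn, [ppe_dphi_df(f) for f in freqs])
```

With these sign conventions a positive beta and negative b give a negative cycle count, so only the magnitude is comparable to the tabulated values. The normalization by the weighted integral means that a phase term contributing a constant N(f) returns exactly that constant.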



Signal        $\beta_{-3}$ term   $\beta_{-2}$ term
Convergent    0.312               0.012
Critical      0.312               0.194
Asymptotic    0.312               0.607

TABLE II: Number of useful cycles from the different injected
ppE terms, for Fig. 1 and Fig. 2.

Figure 2 shows the PDFs of the recovered $\beta_{-3}$ parameter
for a one-parameter ppE template family, with
injections given by convergent, critical and asymptotic
versions of cases (ii) and (iii). These PDFs are computed
using an MCMC approach. The top panel of this figure
shows the PDFs for $\beta_{-3}$ given an asymptotic injection,
the middle panel given a critical injection, and the
bottom panel given a convergent injection. The left and
right panels correspond to injections with the same (left)
or alternating (right) signs. When there is as much or
more weight at $\beta_{-3} = 0$ in the PDFs as there was in
the prior probability density, this indicates that GR is
the preferred model. In our case, the prior probability
for $\beta_{-3}$ is flat between -5.0 and 5.0, and so the prior
probability density at all points, including $\beta_{-3} = 0$, is
0.1. When the posterior density at $\beta_{-3} = 0$ is less than
0.1, an alternative model is preferred.



Figure 2 reveals several interesting facts. First, observe
that in the convergent and in the asymptotic injection
cases, the sign of the $\beta$s is irrelevant: in both cases most
of the weight is outside $\beta_{-3} = 0$. Second, observe that
in the convergent case, the second ppE term (b = -2)
is very sub-dominant to the first term, and so its sign
has little impact on the results. Third, observe that in
the critical injection case, when the $\beta$s have alternating
signs, the modified gravity effects partially cancel out,
yielding a $\beta_{-3}$ PDF with non-negligible weight at the
GR value. It is clear from these studies that neglecting
higher-order phase corrections can seriously bias our
assessment of our ability to test GR with GW signals. For
the "Critical" case, our ability to detect departures from
GR is enhanced if the terms have the same sign, and
diminished if the signs alternate.



III. NOISE MODELING AND SIGNAL 
STRENGTH 

Most of our studies have been conducted on signals 
that do not have a noise realization explicitly added to 
the signal injection, although all analyses incorporate the 
noise spectrum of the detectors in the likelihood calcula- 
tion. We chose not to include an explicit noise realization 
in order to expedite the calculation of the likelihood [33],
which then allows us to produce long Markov chains that
fully explore the high-dimensional parameter spaces.
Unfortunately, our use of this technique has led some to
question the reliability of our results [1^, [Hj. Here we
show that those concerns are unfounded.

The inclusion of noise in our signals has little effect on
the conclusions we drew in our previous paper, as can be
seen in Fig. 3. In this figure, we plot the ($3\sigma$)-bounds
that we could place on the ppE phase parameters, if one
has detected a NS-NS inspiral with SNR 15 that has no
GR deviation. To calculate these bounds, we inject a
GR signal and try to recover it using a single-parameter
ppE template, i.e. Eq. (1) with a single $\beta$. For any
given value of b, we integrate out over all other parameters
and take the standard deviation of the $\beta$ PDF as
a $1\sigma$ bound. In other words, the curves show the upper
limit of the magnitude a ppE parameter could be found
to have, and still have the signal be consistent with GR.
[Figure 2 plot; six panels showing the PDF of $\beta_{-3}$. Rows, top to bottom: Asymptotic, Critical, Convergent. Columns: same sign (left), alternating signs (right). Horizontal axis: $\beta_{-3}$.]

FIG. 2: The PDFs for $\beta_{-3}$ in a one-parameter ppE template
recovered from MCMC searches on injections containing two
ppE parameters (b = -3 and b = -2). The plots on the left
are for injections containing two ppE parameters of the same
sign, and on the right of opposite signs. The more weight in
the PDF at $\beta = 0$, the lower the Bayes factor in favor of a
non-GR signal. In the critical case, we find that alternating
signs in the phase corrections can cause a non-GR signal to
be indistinguishable from a GR one. In the convergent and
asymptotic cases, this does not occur. System parameters for
this figure are the same as in Figure 1, also listed in Table I,
and the useful cycles of phase are listed in Table II.



This plot shows that the bounds placed on the ppE pa- 
rameters from a signal that includes an explicit noise re- 
alization are consistent with those found when no noise is 
added to the signal. That is, including an explicit noise 
realization does not affect the conclusions derived from 
a cheap-bound calculation with noise accounted for only 
through the detectors' noise spectrum in the likelihood. 
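
The bound calculation described above reduces to three times the standard deviation of the marginalized samples of the ppE parameter. The Python sketch below illustrates this with synthetic samples; the Gaussian chains, their width, and the constant offset used to mimic a noise-induced shift of the peak are all assumptions for the example.

```python
import random
import statistics


def three_sigma_bound(samples):
    """Three times the standard deviation of the marginalized beta samples."""
    return 3.0 * statistics.pstdev(samples)


rng = random.Random(1)
# Stand-in for a zero-noise posterior chain of beta (hypothetical width).
zero_noise = [rng.gauss(0.0, 0.02) for _ in range(50_000)]
# Mimic a noise realization that shifts the peak without changing the spread.
shifted = [x + 0.015 for x in zero_noise]

bound_zero_noise = three_sigma_bound(zero_noise)
bound_shifted = three_sigma_bound(shifted)
```

Because the standard deviation is insensitive to a uniform shift of the samples, the two bounds agree, which is the sense in which a displaced peak leaves the bound unchanged.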

To understand this result, it is useful to look at Figure 4,
which shows the recovered PDFs for the $\beta$ parameter
from three different runs, each including noise
generated with a different random seed. Since the
injected signal was a GR NS-NS inspiral waveform with
SNR 15, we would expect the $\beta$ PDFs to peak at zero.




FIG. 3: (Color Online) ($3\sigma$)-bounds on $\beta$ that can be
inferred for different values of b, calculated from the PDFs of
$\beta$ generated by recovering a GR signal with a ppE template.
This plot shows the bounds for both a signal with no noise,
and three that include Gaussian noise, generated from three
different random seeds. The results are essentially identical.
The signal parameters for this injection are in Table I.



It is clear from this figure that, although the peak of the 
PDF is shifted by the inclusion of noise, the uncertainty 
in the recovery of this parameter, i.e. the spread of the 
distribution, is not affected. This concept has been explored
before, in [35] and [HI. In [35], the authors argue
that when discussing our ability to measure system parameters
in general, and not for a particular case, what
we really want to do is examine the noise-averaged uncertainties
in these parameters. That is, we are interested in
how well we can measure parameters when averaged over
many specific realizations of the noise. The authors show
that the noise-averaged uncertainties are the same as the
uncertainties calculated with zero noise injected into the
signal. In [HI it is argued that the specific noise realization
will affect our parameter estimation, and while this
is technically true, we have shown in this section that
the overall effect is minimal. In any case, for the type of
analysis that we want to do in the rest of this paper, the
reasoning of [If| applies, and so we do not inject an explicit
noise realization for any of our analyses in the other
sections. It has also been claimed in [25] that simulated
data that only includes a signal injection, i.e. that does
not include a noise realization, will necessarily lead to
posterior distributions for the system parameters that are
Gaussian. This is patently false, as can easily be demonstrated
by analytically calculating the posterior distribution
for a signal of the form $(d_0/d) \cos(2\pi f t)$, which leads
to a highly non-Gaussian distribution in the distance $d$.
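
The distance example can be worked through in a few lines. The sketch below assumes stationary Gaussian noise with the usual noise-weighted inner product $(\cdot|\cdot)$, a known frequency and phase, no noise realization in the data, a true distance $d_t$, and a prior $p(d)$; the symbols $d_t$, $c(t)$ and $\rho$ are introduced here for illustration.

```latex
% Data without a noise realization, and the template at trial distance d:
s(t) = \frac{d_0}{d_t}\, c(t), \qquad
h(t; d) = \frac{d_0}{d}\, c(t), \qquad
c(t) \equiv \cos(2\pi f t).

% The residual is proportional to c(t), so with \rho^2 \equiv (s|s) = (d_0/d_t)^2 (c|c):
(s - h \,|\, s - h) = \left(\frac{d_0}{d_t} - \frac{d_0}{d}\right)^{2} (c|c)
                    = \rho^{2} \left(1 - \frac{d_t}{d}\right)^{2}.

% Posterior for the distance:
p(d \,|\, s) \propto p(d)\, \exp\!\left[-\frac{\rho^{2}}{2}\left(1 - \frac{d_t}{d}\right)^{2}\right].
```

The exponent is quadratic in $1/d$ rather than in $d$, so the posterior is Gaussian in the inverse distance. In $d$ itself it falls off steeply for $d < d_t$ but approaches $p(d)\, e^{-\rho^2/2}$ as $d \to \infty$, which gives the asymmetric, non-Gaussian shape even though no noise realization was added.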

Obviously, signals with high SNR will be better for 
testing GR, as they are better for any type of GW data 
analysis. When discussing how well GR can be tested using
GW detections, the highest-SNR events are the ones
that will lead to the strongest constraints. In our previous
paper, we analyzed signals with SNR ~ 20, which
FIG. 4: (Color Online) The top panel shows posterior distri- 
butions of $\beta$ recovered from three ppE injections, including 
noise in the injection. Each of the three signals was gener- 
ated using a different random seed for the noise, but the same 
system parameters. The lower panel shows the same distri- 
butions, now with the best-fit value of $\beta$ subtracted. This 
illustrates that, although noise affects the peak of the poste- 
rior distribution for a given parameter, it does not affect the 
uncertainty in that parameter. Thus the cheap bounds of [24] 
are unaffected by the inclusion of noise. 



FIG. 5: (Color Online) ($3\sigma$)-bounds on $\beta$ for $b = -1.0$, calcu- 
lated from the PDFs of $\beta$ generated by recovering a GR signal 
with a ppE template. This plot shows the linear relationship 
between the bounds on $\beta$ and the SNR of the signal. There 
are four lines shown: one for a signal that had no noise in- 
jected, and three for signals that had noise injected, each with 
a different random seed. The results are essentially identical. 
The signal parameters for this injection are in Table I. 



would be considered a high SNR detection by the LIGO 
detectors. It is irrelevant, however, that most signals will 
probably have SNRs in the low 10s. There will always be 
one signal with highest SNR, and this is likely to be above 
15. It is therefore still useful to study GR tests assuming 
detections with SNRs ~ 20, as it is not a hopeless propo- 
sition that we will have this type of event in our GW 
catalog. Throughout the rest of this paper, however, we 
have taken a more pessimistic tack, and restricted our- 
selves to analyzing signals with SNR ~ 10-12. The 
results follow the theoretical linear scaling with SNR [33] 
down to values of the SNR that are close to the detec- 
tion threshold, which for this system was found to be 
SNR ~ 7.5. This scaling is shown in Figure 5. 



IV. OPTIMAL MODEL SELECTION 

We have seen that it is important to consider multi- 
term ppE signal injections when assessing the bounds 
we will be able to place on alternative gravity theories. 
The question still remains, however, as to what type of 
templates we should use to recover such signals. In this 
section we address this question by showing first that 
adding too many parameters to the templates is counter- 
productive. Then we determine the optimal ppE tem- 
plate family to detect departures from GR described by 
the more realistic multi-term ppE signal injection model. 

A. Overfitting 

One may consider using a ppE template with many 
ppE phase and amplitude terms in the sums of Eq. 
For example, one could include as many ppE phase terms 
as there are in the GR PN series, but this is far from 
ideal. The reason is clear: if we include the same num- 
ber of free ppE parameters in our phase model as we 
have phase terms that are functions of system parame- 
ters, then there is no way to constrain any of them. In 
other words, the ppE phase terms will have a 100% cor- 
relation with the standard GR system parameters that 
form the coefficients of the GR PN phase. 

As a simple example, consider the possibility of de- 
tecting a non-GR signal that includes ppE corrections 
at $b = -5$ (a so-called Newtonian ppE correction) and 
$b = -3$ (a 1PN ppE correction). We will truncate our in- 
jection at 1PN order for this example, which implies that 
the GW phase contains two standard PN terms that are 
functions of the system parameters, and two free ppE 
terms. Figure 6 shows that there is a 100% correlation 
between these PN and ppE parameters. 

These types of correlations are commonly encountered 
in GW data analysis, but they may not be widely appre- 
ciated by theoretical model builders. We can understand 
this correlation analytically as follows. Let us write the 
FIG. 6: Correlation between the $\beta_{-5}$ ppE parameter and the chirp mass (left panel) and the $\beta_{-3}$ parameter and the inverse 
chirp mass (right panel) for an injection including two PN phase terms as well as two ppE phase corrections. The parameters 
are restricted only by their prior ranges. 



simplified ppE template Fourier phase $\Psi_{\rm ppE}(f)$ as follows 

$$
\Psi_{\rm ppE}(f) = \frac{3}{128} (\pi \mathcal{M} f)^{-5/3} + \beta_{-5} (\pi \mathcal{M} f)^{-5/3}
 + \frac{3}{128\, \eta^{2/5}\, \pi \mathcal{M} f} \left( \frac{3715}{756} + \frac{55}{9} \eta \right)
 + \frac{\beta_{-3}}{\pi \mathcal{M} f} , \qquad (7)
$$

where we have expanded out the definition of $u$. Clearly, 
we can rescale $\beta_{-5}$ by a constant and $\beta_{-3}$ by a function 
of $\eta$ to recover the same value of the Fourier phase, thus 
showing a direct correlation between these parameters. 
Figure 6 demonstrates how such a correlation manifests 
itself in the posterior distributions. 
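
This degeneracy can be made concrete with a short numerical check. The sketch below assumes that the simplified phase of Eq. (7) takes the standard 1PN stationary-phase form, with a Newtonian coefficient $3/128$ and a 1PN coefficient $(3715/756 + 55\eta/9)$, and with $u = (\pi \mathcal{M} f)^{1/3}$. The function names, the frequency range and the mass values are illustrative and are not taken from the analysis in this paper.

```python
import numpy as np

MSUN_S = 4.925491e-6  # solar mass in seconds (G = c = 1)

def pn_coeff(eta):
    """1PN phase coefficient 3715/756 + 55*eta/9."""
    return 3715.0 / 756.0 + 55.0 * eta / 9.0

def one_pn_amplitude(eta):
    """Coefficient multiplying (pi * Mc * f)^(-1) in the GR phase."""
    return 3.0 * pn_coeff(eta) / (128.0 * eta**0.4)

def ppe_phase(f, mchirp, eta, beta_m5=0.0, beta_m3=0.0):
    """Simplified Fourier phase: Newtonian and 1PN terms plus two ppE terms."""
    x = np.pi * mchirp * f
    newtonian = (3.0 / 128.0 + beta_m5) * x ** (-5.0 / 3.0)
    one_pn = (one_pn_amplitude(eta) + beta_m3) / x
    return newtonian + one_pn

def compensating_betas(mchirp_gr, eta_gr, mchirp, eta):
    """ppE amplitudes for which (mchirp, eta) reproduces the GR phase
    of a system with (mchirp_gr, eta_gr) at every frequency."""
    ratio = mchirp / mchirp_gr
    beta_m5 = (3.0 / 128.0) * (ratio ** (5.0 / 3.0) - 1.0)
    beta_m3 = one_pn_amplitude(eta_gr) * ratio - one_pn_amplitude(eta)
    return beta_m5, beta_m3
```

For any pair of trial values (`mchirp`, `eta`), `compensating_betas` returns ppE amplitudes that give exactly the same phase as the GR signal. With two PN terms and two free ppE terms the data therefore cannot distinguish a change in the masses from a change in the ppE parameters.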

This argument can be extended to whatever PN order 
we choose. If we include the same number of ppE terms 
as PN terms in our model, then we will not be able to 
place bounds on any parameter, let alone use the results 
as a test of GR. It is also true, however, that ppE models 
that include more ppE terms will be able to achieve a 
better overall fit of whatever signal we happen to detect, 
just as any model with extra parameters can typically fit 
data better than a simpler model. In the next section we 
explore the tradeoff between these two effects. We also 
attempt to determine what types of signals are best to 
analyze using more complex ppE models, and what types 
are better served with a simple ppE model. 



B. Parsimonious Fitting: Detecting and 
Characterizing non-GR signals 

Let us now study what type of ppE templates are best 
suited for detecting a GR deviation. In particular, let us 
examine whether one-term or two-term ppE templates 
work better. For this analysis, we inject ppE signals 
containing three phase terms, and attempt to recover 
them using one- and two-parameter ppE templates. We 
calculate the Bayes factor between each ppE model and 
the GR model to see which model is best suited to de- 
tecting departures from GR. Because of our strong prior 
belief in the validity of GR, a Bayes factor significantly 
greater than unity would be necessary to convince us that 
a new theory of gravity is needed. 

Let us then consider three different ppE injections, 
each starting at 1PN order ($b = -3, -2, -1$): a convergent, a 
critical and an asymptotic one, each for an NS-NS inspi- 
ral, with parameters listed in Table III. We explore these 
simulated signals with an MCMC algorithm, using a one- 
and a two-term ppE model. The one-term ppE models 
are allowed to choose between phase exponents $b = -3$ 
and $b = -2$, while the two-term models are allowed to 
choose between the pairs $(-3, -2)$ and $(-2, -1)$, i.e. the 
two terms must differ by a single power of $u$, and models 
with exponents $(-3, -1)$ are not allowed. 

Source      $\alpha$  $\phi_L$  $\phi_c$  $m_1 (M_\odot)$  $m_2 (M_\odot)$  $\log(D_L)$ (Mpc)  $t_c$  $\delta$  $\theta_L$  $\beta_{-3}$  $\beta_{-2}$  $\beta_{-1}$
Convergent  1.42      2.5       0.8       1.42             1.73             3.83               3.5    0.87      0.43        0.003         0.003         0.003
Critical    1.42      2.5       0.8       1.52             1.33             3.9                3.5    0.87      0.43        0.0006        0.018         0.54
Asymptotic  1.42      2.5       0.8       2.04             1.34             3.86               3.5    0.87      0.43        0.0007        0.07          7.0

TABLE III: Source parameters for Figures 7 and 8. The $\beta_b$ values listed are for a particular case; the ratio between different 
$\beta_b$ values was kept constant for each injected signal. The ratio for convergent was x1.0, critical was x30, and asymptotic was 
x100. 

The Bayes factors between the one-term ppE model 
and GR (red solid curve) and between the two-term ppE 
model and GR (blue dashed curve) are shown in Fig. 7 
as a function of the injected value of $\beta_{-3}$ for a convergent 
(top-left panel), critical (top-right panel) and asymptotic 
(bottom panel) injection. These Bayes Factors are again 
calculated using the Savage-Dickey density ratio. Calcu- 
lating the posterior density at $\beta_b = 0$ from a Markov 
chain involves counting the number of points in the chain 
that fall within the histogram bin containing $\beta_b = 0$, and 
so the error bars reflect the counting error involved in 
this process, as well as the spread in Bayes Factor values 
calculated from multiple MCMC runs on the same signal but 
with different random seeds. Observe that the only in- 
jections for which two-term ppE templates consistently 
outperform one-term ppE templates are the critical ones. 
Even in this case, however, the preference is not large; 
the curves track each other very well in all cases. There- 
fore, our results indicate that the one-term ppE templates 
are sufficient for searching for deviations from GR in GW 
data. 
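
The histogram-counting estimate of the Savage-Dickey density ratio described above can be sketched as follows. This is a minimal illustration rather than the code used for Fig. 7. It assumes a one-dimensional chain of samples of a single ppE amplitude, a known prior density at zero, and a bin of width `bin_width` centered on zero. The function name and arguments are hypothetical. The quoted error is the simple Poisson counting error, which ignores correlations between successive chain samples.

```python
import numpy as np

def savage_dickey_bf(samples, prior_density_at_zero, bin_width):
    """Bayes factor (ppE over GR) from posterior samples of a ppE amplitude.

    The posterior density at zero is estimated from the number of samples
    in a histogram bin of width bin_width centered on zero.
    Returns (bf, bf_err), with bf_err the Poisson counting error.
    """
    samples = np.asarray(samples, dtype=float)
    n_total = samples.size
    n_zero = np.count_nonzero(np.abs(samples) < 0.5 * bin_width)
    if n_zero == 0:
        # The chain never visited the GR value: only a lower bound exists.
        return np.inf, np.inf
    posterior_density_at_zero = n_zero / (n_total * bin_width)
    bf = prior_density_at_zero / posterior_density_at_zero
    bf_err = bf / np.sqrt(n_zero)
    return bf, bf_err
```

For example, with a uniform prior on $[-1, 1]$ the prior density at zero is 0.5. A posterior sharply peaked away from zero leaves few samples in the central bin, giving a large Bayes factor in favor of the ppE model together with a large fractional counting error, which is why the error bars grow as the evidence against GR increases.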

Once a deviation from GR has been definitively de- 
tected, the next step is to learn as much about the signal 
as possible, in order to give theorists as much guidance 
as possible in their attempts to build an alternative the- 
ory of gravity. The information we could hope to extract 
from the type of analysis we have described in this paper 
is the structure of the series of phase corrections - do they 
enter at a certain PN power and then fade away? Or do 
they enter at that power and grow more important at 
higher orders in the expansion? Figure 8 plots the pos- 
terior distribution of the five models under consideration, 
derived using an RJMCMC [13] analysis. In RJMCMC, 
moves are proposed between models of different dimen- 
sionality according to the Metropolis-Hastings ratio: 

$$
\alpha = \min \left\{ 1, \frac{p(\vec{\lambda}_Y)\, p(s|\vec{\lambda}_Y)\, q(\vec{u}_Y)}{p(\vec{\lambda}_X)\, p(s|\vec{\lambda}_X)\, q(\vec{u}_X)} |J| \right\} \qquad (8)
$$

Here, model X and model Y differ by some number of 
parameters, q(u) is the distribution for random numbers 
chosen to generate the extra parameters, and |J| is the 
Jacobian of the two sets of parameters, which compen- 
sates for the difference in dimensionality. When using 
this Hastings ratio as an acceptance probability, we can 
allow our chains to explore the full space of allowed ppE 
models, both one- and two-term families, and use these 
to generate PDFs for the models themselves. The ratio 
of the heights of the PDF for model X and model Y is 
equal to the Bayes Factor between X and Y. 
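
The relation between model occupancy and the Bayes factor can be illustrated with a toy RJMCMC. The sketch below is not the sampler used in this paper. It assumes two nested models for data with sample mean `ybar` from `n` unit-variance measurements: model 0 fixes the mean to zero, and model 1 has a free mean `mu` with a uniform prior on `[-half_width, half_width]`. Trans-dimensional proposals draw `mu` from the prior, so the prior and proposal densities cancel and the Jacobian is unity. All names and numerical values are illustrative.

```python
import numpy as np

def log_like(mu, n=20, ybar=0.5):
    """Gaussian log-likelihood, up to a constant, for mean parameter mu."""
    return -0.5 * n * (ybar - mu) ** 2

def rjmcmc_counts(n_iter=100_000, half_width=2.0, step=0.3, seed=0):
    """Return the number of iterations spent in model 0 and in model 1."""
    rng = np.random.default_rng(seed)
    k, mu = 0, 0.0  # current model index and parameter (unused if k == 0)
    counts = [0, 0]
    for _ in range(n_iter):
        if rng.random() < 0.5:
            # Within-model Metropolis update (model 0 has no free parameter).
            if k == 1:
                prop = mu + step * rng.normal()
                inside = abs(prop) <= half_width
                if inside and np.log(rng.random()) < log_like(prop) - log_like(mu):
                    mu = prop
        elif k == 0:
            # Birth move: draw the extra parameter from q(u) = prior, |J| = 1.
            u = rng.uniform(-half_width, half_width)
            if np.log(rng.random()) < log_like(u) - log_like(0.0):
                k, mu = 1, u
        else:
            # Death move: the reverse of the birth move.
            if np.log(rng.random()) < log_like(0.0) - log_like(mu):
                k = 0
        counts[k] += 1
    return counts

def analytic_bf(half_width=2.0, n_grid=40001):
    """Evidence ratio of model 1 to model 0 by direct numerical integration."""
    grid = np.linspace(-half_width, half_width, n_grid)
    dx = grid[1] - grid[0]
    ratio = np.exp(log_like(grid) - log_like(0.0))
    return np.sum(ratio) * dx / (2.0 * half_width)
```

With equal prior odds for the two models, `counts[1] / counts[0]` from `rjmcmc_counts()` estimates the Bayes factor and can be compared with `analytic_bf()`. In a realistic analysis the proposal for the new parameters would usually be tuned rather than drawn from the prior, in which case the full ratio of Eq. (8) must be evaluated.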



Signal      $\beta_{-3}$  $\beta_{-2}$  $\beta_{-1}$
Convergent  0.109         0.008         0.0005 
Critical    0.024         0.051         0.085 
Asymptotic  0.024         0.181         1.047 

TABLE IV: Number of useful cycles from the different in- 
jected ppE terms - Fig. 

To generate Figure 8 we have run an RJMCMC search 
on three different types of signals (one convergent, one 
critical, and one asymptotic) and plotted the number of 
iterations that the chains spent in each of the five differ- 
ent models. These five models include two ppE models 
with only one phase correction ($b = -3$ and $b = -2$), two 
ppE models with two phase corrections ($b = -3$ plus $b = -2$, 
and $b = -2$ plus $b = -1$), and GR. We find that, although 
there are some slight differences between the different 
models, in all cases we cannot draw meaningful distinc- 
tions between the different ppE models. The strongest 
Bayes Factor between two models is in the convergent 
case, where the Bayes Factor between the $b = -3$ only 
model and the $b = -2$ only model is ~ 5. While this 
does show some preference for the first model, it is not a 
strong preference, and so we would not want to use this 
result to draw conclusions about the underlying theory 
of gravity. In summary, even though these signals are 
clearly differentiable from GR (all have Bayes Factors of 
~ 100), the four different ppE models perform almost 
equally well in fitting the signal. This means that if we hope 
to gain more information about the underlying nature of 
an alternative gravity theory, we would need higher SNR 
signals and/or multiple detections. On a more hopeful 
note, it means that our ability to detect a deviation from 
GR is not strongly dependent on which particular ppE 
template we choose to use in our analysis. 



V. FUTURE DIRECTIONS 

In this paper, we have investigated how using more 
realistic non-GR injections affects our ability to test 
GR using GW signals. We have found that the 
inclusion of noise in our analysis does not significantly 
affect our results, but that the failure to include higher- 
order deviations from GR in the phase of the injected 
signal can bias them. We have also determined that one- 
parameter ppE template families are best for detecting 
deviations from GR, at least for the simple cases investi- 
gated here. 

The main direction of future work will be in determin- 
ing how analyzing more astrophysically realistic systems 
affects our ability to test GR. That is, systems that incor- 
porate not only more complicated deviations from GR, 
such as we examined in this paper, but that also include 
some of the messiness we know will exist in real systems 
in our universe. For instance, if we were analyzing sys- 
tems that merge within the aLIGO frequency band, we 
would need to include the merger and ringdown parts of 
the waveforms in our injections. If we then performed a 
Bayesian model selection between ppE and GR inspiral- 
only templates, it is entirely possible that the ppE tem- 
plates would win over the GR ones, simply by being able 
to fit more of the power in the non-inspiral signal. It is 
FIG. 7: (Color Online) Bayes factors for one-term (solid red) and two-term (dashed blue) ppE templates for a convergent (top- 
left), critical (top-right) and asymptotic (bottom) ppE injection as a function of the injected value of $\beta_{-3}$. System parameters 
are listed in Table III and useful cycles of phase in Table IV. In the convergent and asymptotic cases, both models perform 
equally well at detecting a deviation from GR. In the critical case, the two-term model slightly out-performs the one-term 
model. 
FIG. 8: Posterior distributions for the four different ppE mod- 
els, generated by RJMCMC. The top two panels show the 
distribution for a convergent injection, the middle two for a 
critical injection, and the bottom two for an asymptotic in- 
jection. All systems are NS-NS binaries with Bayes Factors 
of 100 favoring ppE over GR. System parameters are in Ta- 
ble III. Model I has $b = -3$, model II has $b = -2$, model 
III has $b = -3$ and $b = -2$, and model IV has $b = -2$ and 
$b = -1$. The y axis shows the percentage of iterations that 
the chain spent in each model, and the Bayes Factors between 
two models are simply the ratios of the percentages. Because 
the Bayes Factors are not large enough, these results indicate 
that we would not be able to make confident statements about 
the type of non-GR signal we had observed with this type of 
analysis. 